Introduction to ggplot2

Emorie D Beck

Visualizing Data in ggplot2

What is ggplot2 trying to do?

  • Create a grammar of graphics
  • Aims to help draw connections across diverse plots
  • Create order in the chaos of complicated plots

From Wickham (2010):

A grammar of graphics is a tool that enables us to concisely describe the components of a graphic.

What are the core elements of ggplot2 grammar?

  • Mappings: base layer
  • Scales: control and modify your mappings
  • Geoms: plot elements
  • Facets: panel your plot
  • Grobs: graphical objects that aren’t geoms but that we want to layer on, like text, arrows, and other annotations
  • Themes: style your figure
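
As a rough sketch of how these pieces fit together, here is one small plot with each element labeled. The `toy` data frame and the annotation text are made up for illustration and are not the ESM data used in the rest of these slides.

```r
library(ggplot2)

# Hypothetical toy data (not the ESM data used in these slides)
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6)
  , y = c(2, 3, 3, 5, 4, 6)
  , group = rep(c("a", "b"), each = 3)
)

p <- ggplot(data = toy, mapping = aes(x = x, y = y)) +  # mappings
  scale_y_continuous(limits = c(0, 7)) +                # scales
  geom_point() +                                        # geoms
  facet_wrap(~group) +                                  # facets
  annotate("text", x = 2, y = 6.5, label = "a note") +  # layered-on extras
  theme_classic()                                       # themes

p
```

Each `+` adds one piece, and most pieces can be swapped out without touching the others.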

But first, our data

  • These are some Experience Sampling Method data I collected during my time in graduate school
  • Specifically, these include data from Beck & Jackson (2022)
  • In that paper I built personalized machine learning models of behaviors and experiences from sets of:
    • psychological
    • situational
    • and time variables
library(tidyverse) # provides %>%, filter(), pivot_longer(), and ggplot2

load(url("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/02-week2-ggplot2/01-data/ipcs_data.RData"))
ipcs_data %>% 
  print(n = 6)
# A tibble: 4,222 × 70
  SID   Full_D…¹ afraid angry atten…² content excited goaldir guilty happy proud
  <chr> <chr>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl> <dbl>
1 02    2018-10…      1     2       4       4       2       5      2     3     4
2 02    2018-10…      1     1       4       3       2       5      1     3     3
3 02    2018-10…      2     1       2       3       1       2      2     3     2
4 02    2018-10…      2     2       4       3       2       4      1     3     3
5 02    2018-10…      2     1       4       4       3       4      1     3     3
6 02    2018-10…      2     1       4       4       2       4      1     3     3
# … with 4,216 more rows, 59 more variables: purposeful <dbl>,
#   agreeableness_Compassion <dbl>, agreeableness_Respectfulness <dbl>,
#   agreeableness_Trust <dbl>, conscientiousness_Organization <dbl>,
#   conscientiousness_Productiveness <dbl>,
#   conscientiousness_Responsibility <dbl>, extraversion_Assertiveness <dbl>,
#   extraversion_Energy.Level <dbl>, extraversion_Sociability <dbl>,
#   neuroticism_Anxiety <dbl>, neuroticism_Depression <dbl>, …

Mappings

  • The first thing we call with ggplot2 is always the ggplot() function, which has two core arguments:
    • data: your data object (can also be piped)
ggplot(
  data = ipcs_data
)

    • mapping: your aesthetic mappings for the plot, wrapped in aes()
ggplot(
  data = ipcs_data
  , mapping = aes(x = linear, y = happy)
)

Mappings

  • How many aesthetic mappings are there?
  • x
  • y
  • col / color
  • fill
  • shape
  • size
  • linetype
  • xmin / xmax
  • ymin / ymax
  • alpha
  • There are lots of geom-specific ones, too
  • Use ?geom_whatever() to get more info on a specific geom
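
One distinction worth a quick sketch: an aesthetic can be mapped to a variable (inside aes()) or set to a constant (outside aes()). The `toy` data frame below is hypothetical, and `layer_data()` is used only to peek at what each layer ends up drawing.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  x = 1:4
  , y = c(2, 4, 3, 5)
  , group = c("a", "a", "b", "b")
)

# Mapped: color varies with a variable, so it goes inside aes()
p_mapped <- ggplot(toy, aes(x = x, y = y, color = group)) +
  geom_point()

# Set: every point gets the same constant color, so it goes outside aes()
p_set <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "blue")

# layer_data() returns the data a layer actually draws
unique(layer_data(p_mapped)$colour)  # one color per group
unique(layer_data(p_set)$colour)     # a single constant color
```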

Scales

  • Every mapping has a scale
  • Scales can be lots of different things
  • In ggplot2 language, some core ones are:
  • continuous
  • discrete
  • manual
  • ordinal
  • binned
  • date
  • brewer
  • All of these have specific arguments based on the type of scale
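
Scale functions follow the pattern scale_<aesthetic>_<type>(). A minimal sketch, using a made-up `toy` data frame and arbitrary color choices, pairs a continuous scale on x with a manual scale on color:

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  x = c(1, 2, 3, 4)
  , y = c(2, 4, 3, 5)
  , group = c("a", "a", "b", "b")
)

p <- ggplot(toy, aes(x = x, y = y, color = group)) +
  geom_point() +
  # continuous scale for the x mapping
  scale_x_continuous(limits = c(0, 5), breaks = 0:5) +
  # manual scale for the color mapping: we choose the values ourselves
  scale_color_manual(values = c(a = "darkorange", b = "purple"))

# the colors that end up on the points
unique(layer_data(p)$colour)
```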

Scales

continuous

  • Let’s try the continuous scale with our y mapping
  • We’ll use the following three arguments
    • limits: vector length 2
    • breaks: vector of any length
    • labels: numeric or character vector
ggplot(
  data = ipcs_data
  , mapping = aes(x = linear, y = happy)
) + 
  scale_y_continuous(
    limits = c(1, 5)
    , breaks = seq(1, 5, by = 2)
    , labels = c("one", "three", "five")
  )

Geoms

  • We’ll loop back to scales after talking about some geoms
  • It’s not possible to go through all the possible geoms (that’s what the rest of the class is for!)
  • We’ll focus on some basic ones for now
  • geom_point()
  • geom_jitter()
  • geom_smooth()
  • geom_hline() / geom_vline()
  • geom_bar()
  • geom_boxplot()
  • geom_density()
  • geom_histogram()

Geoms: geom_point()

Your basic scatterplot!

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(
    x = purposeful
    , y = happy
    )) + 
    geom_point() + 
    theme_classic() # I just hate grey backgrounds

Geoms: geom_point()

Your basic scatterplot!

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(
    x = purposeful
    , y = happy
    , color = happy
    )) + 
    geom_point() + 
    theme_classic() # I just hate grey backgrounds

Geoms: geom_point()

Your basic scatterplot!

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(
    x = purposeful
    , y = happy
    , color = happy
    )) + 
    scale_color_continuous(type = "viridis") +
    geom_point() + 
    theme_classic() # I just hate grey backgrounds

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(
    x = purposeful
    , y = happy
    , color = happy
    )) + 
    scale_color_gradient(low = "red", high = "blue") +
    geom_point() + 
    theme_classic() # I just hate grey backgrounds

Geoms: geom_jitter()

  • Sometimes we have data that have lots of repeating values, especially with ordinal response scales where the variables aren’t composited / latent
  • jitter adds a small amount of random noise to each point’s position so that overlapping points become visible
ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(
    x = purposeful
    , y = happy
    )) + 
    geom_jitter() + 
    theme_classic() # I just hate grey backgrounds

This may be too much jitter

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(
    x = purposeful
    , y = happy
    )) + 
    geom_jitter(width = .1, height = .1) + 
    theme_classic() # I just hate grey backgrounds

Geoms: geom_jitter()

alpha

Alpha can help us understand how many points are stacked when using jitter (or other overlapping data)

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(
      width = .1
      , height = .1
      , alpha = .25
      ) + 
    theme_classic() # I just hate grey backgrounds

Geoms: geom_smooth()

  • geom_smooth() allows you to apply statistical functions to your data
  • There are other ways to do this that we won’t cover today
  • Core arguments are:
    • method: “loess”, “lm”, “glm”, “gam”
    • formula: e.g., y ~ x or y ~ poly(x, 2)
    • se: display the confidence interval around the smooth (TRUE / FALSE)
    • aes() wrapped aesthetics or directly mapped aesthetics
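
To make the formula argument concrete, here is a sketch that layers a straight-line fit and a quadratic fit on the same points. The curved `toy` data are simulated purely for illustration.

```r
library(ggplot2)

# Hypothetical simulated data with a curved relationship
set.seed(1)
toy <- data.frame(x = seq(-3, 3, length.out = 50))
toy$y <- toy$x^2 + rnorm(50, sd = .5)

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(alpha = .5) +
  # straight line, no interval
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey40") +
  # quadratic, with the confidence band drawn
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE, color = "blue")

p
```

Because color is set outside aes() here, each smooth gets one constant color rather than a legend entry.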

Geoms: geom_smooth()

Remember: it’s a LAYERED grammar of graphics, so let’s layer!

method = "lm"

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_smooth(
      method = "lm"
      , formula = y ~ x
      , se = F
      , color = "blue"
    ) + 
    theme_classic() # I just hate grey backgrounds

method = "loess"

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_smooth(
      method = "loess"
      , formula = y ~ x
      , se = F
      , color = "blue"
    ) + 
    theme_classic() # I just hate grey backgrounds

Geoms: geom_smooth()

Remember: it’s a LAYERED grammar of graphics, so let’s layer!

method = "lm"

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_smooth(
      method = "lm"
      , formula = y ~ x
      , se = T
      , color = "blue"
    ) + 
    theme_classic() # I just hate grey backgrounds

method = "loess"

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_smooth(
      method = "loess"
      , formula = y ~ x
      , se = T
      , color = "blue"
    ) + 
    theme_classic() # I just hate grey backgrounds

Geoms:

geom_hline()/geom_vline()

  • Sometimes, we will want to place lines at various intercepts
  • We’ll get into specific use cases as the course progresses
  • geom_hline(): horizontal lines have yintercept mappings
  • geom_vline(): vertical lines have xintercept mappings

Geoms:

geom_hline()/geom_vline()

geom_hline(): horizontal lines have yintercept mappings

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_hline(
      aes(yintercept = mean(happy, na.rm = T))
      , linetype = "dashed"
      ) + 
    geom_smooth(method = "lm", formula = y ~ x, se = F, color = "blue") + 
    theme_classic() # I just hate grey backgrounds

geom_vline(): vertical lines have xintercept mappings

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_vline(
      aes(xintercept = mean(purposeful, na.rm = T))
      , linetype = "dashed"
      ) + 
    geom_smooth(method = "lm", formula = y ~ x, se = F, color = "blue") + 
    theme_classic() # I just hate grey backgrounds

Geoms: geom_bar()

  • Bar graphs can be useful for showing relative differences
  • My hot take is that they are rarely that useful
    • (This is mostly because of how we perceive errorbars and differences, which we’ll talk more about in a few weeks!)
  • But let’s look at using them for frequency and means / se’s

Geoms: geom_bar()

Frequency

How often did our participant have an argument, interact with others, study, and feel tired?

ipcs_data %>%
  filter(SID == "216") %>% 
  select(SID, Full_Date, argument, interacted, study, tired) %>%
  pivot_longer(
    cols = argument:tired
    , names_to = "item"
    , values_to = "value"
    , values_drop_na = T
  ) %>%
  group_by(item) %>%
  summarize(value = sum(value == 1)) %>%
  ggplot(aes(x = item, fill = item, y = value)) + 
    geom_col(color = "black") + 
    theme_classic()

Geoms: geom_bar()

Mean differences

Were there mean-level differences in our continuous variables?

ipcs_data %>%
  filter(SID %in% c("216")) %>%
  select(SID, Full_Date, happy, purposeful, afraid, attentive) %>%
  pivot_longer(
    cols = c(-SID, -Full_Date)
    , names_to = "item"
    , values_to = "value"
    , values_drop_na = T
  ) %>%
  group_by(item) %>%
  summarize(
    mean = mean(value)
    , ci = 1.96*(sd(value)/sqrt(n()))
    ) %>%
  ggplot(aes(x = item, fill = item, y = mean)) + 
    geom_col(color = "black") + 
    geom_errorbar(
      aes(ymin = mean - ci, ymax = mean + ci)
      , position = position_dodge(width = .1)
      , width = .1
      , stat = "identity"
    ) + 
    theme_classic()
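
The ci column above is the half-width of a normal-approximation 95% interval: 1.96 times the standard error of the mean. A quick worked sketch on made-up ratings:

```r
# Hypothetical ratings for one item
value <- c(2, 3, 3, 4, 3)

n  <- length(value)
se <- sd(value) / sqrt(n)  # standard error of the mean
ci <- 1.96 * se            # half-width of the interval

c(lower = mean(value) - ci, mean = mean(value), upper = mean(value) + ci)
```

The error bars then run from mean - ci to mean + ci, which is what ymin and ymax receive in geom_errorbar().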

Geoms: geom_boxplot()

  • Sometimes called box and whisker plots
  • A method for summarizing a distribution of data without showing raw data
  • Box instead shows 25th, 50th, and 75th percentile (quartiles)
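  • Whiskers extend to the most extreme observation within 1.5 * interquartile range (75%tile-25%tile) of the box edges; points beyond that are drawn as outliers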
  • More fun when we want to compare distributions across variables (IMO)
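
A small sketch of what the box and whiskers summarize, computed by hand and compared with what geom_boxplot() draws. The `toy` values are made up, with one deliberately extreme point.

```r
library(ggplot2)

# Hypothetical values; 12 is far from the rest
toy <- data.frame(value = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12))

# Box: 25th, 50th, and 75th percentiles
q <- quantile(toy$value, probs = c(.25, .5, .75))
iqr <- q[["75%"]] - q[["25%"]]

# Whiskers: most extreme observations still within 1.5 * IQR of the box
upper_whisker <- max(toy$value[toy$value <= q[["75%"]] + 1.5 * iqr])
lower_whisker <- min(toy$value[toy$value >= q[["25%"]] - 1.5 * iqr])

# Compare with the statistics ggplot2 computes for the same data
p <- ggplot(toy, aes(y = value)) + geom_boxplot()
layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]
```

The value 12 falls outside the upper fence, so it is plotted as a separate outlier point rather than stretching the whisker.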

Geoms: geom_boxplot()

One boxplot

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(aes(y = SID, x = happy)) + 
    geom_boxplot(width = .5) + 
    theme_classic()

Geoms: geom_boxplot()

Multiple boxplots

  • We can compare multiple participants
  • Later, we’ll also talk about how to order the boxplots (and other axes) by means, medians, etc.
ipcs_data %>%
  filter(SID %in% c("216", "211", "174", "150", "171")) %>%
  ggplot(aes(
    y = SID, x = happy
    , fill = SID
    )) + 
    geom_boxplot(width = .5) + 
    theme_classic()

Geoms: geom_boxplot()

Multiple boxplots

  • We can also compare multiple variables
ipcs_data %>%
  filter(SID %in% c("216")) %>%
  select(SID, Full_Date, happy, purposeful, afraid, attentive) %>%
  pivot_longer(
    cols = c(-SID, -Full_Date)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(
    y = item
    , x = value
    , fill = item
    )) + 
    geom_boxplot(width = .5) + 
    theme_classic()

Geoms: geom_boxplot()

Advanced!

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    # limits run to 7 (past the 1-5 response scale) to leave room for the marginal boxplots at 6
    scale_x_continuous(limits = c(1,7), breaks = seq(1,5,2)) + 
    scale_y_continuous(limits = c(1,7), breaks = seq(1,5,2)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_boxplot(aes(
      x = 6
      , y = happy
      )) + 
    geom_boxplot(aes(
      y = 6
      , x = purposeful
      )) +
    geom_smooth(
      method = "lm"
      , formula = y ~ x
      , se = F
      , color = "blue"
      ) + 
    theme_classic() # I just hate grey backgrounds

Geoms:

geom_histogram() & geom_density()

  • Useful for showing raw / smoothed distributions of data

Histogram

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(aes(y = happy)) + 
    geom_histogram(
      fill = "grey"
      , color = "black"
    ) + 
    coord_flip() +
    theme_classic()

Density Distribution

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(aes(x = happy, y = after_stat(density))) + # after_stat() replaces the deprecated ..density..
    geom_histogram(
      fill = "grey"
      , color = "black"
    ) + 
    theme_classic()

Geoms:

geom_histogram() & geom_density()

Multiple histograms / density distributions

  • We can compare multiple participants
ipcs_data %>%
  filter(SID %in% c("216", "211", "174", "150", "171")) %>%
  ggplot(aes(
     y = happy
    , fill = SID
    )) + 
    geom_density(alpha = .2) + 
    coord_flip() +
    theme_classic()

Plot Appearance Beyond Mappings: Facets and Themes

Plot Appearance

  • So far, we have only changed appearance via the scale_() functions
  • But that doesn’t change things like axis, text, title, and more
  • Nor does it help when we want to split the plot into multiple panels
  • Let’s do those next!

Plot Appearance: Facets

  • Often, we have lots of other reasons we need to reproduce the same plot multiple times
    • multiple variables
    • multiple people
    • multiple conditions
    • etc.
  • There are more ways to do this than we’ll cover today, like piecing plots together and more

Appearance: Facets

  • The core of directly faceting within ggplot is that you have to facet according to variables in your data set
  • This is part of why we covered moving your data to long
  • Say that you want to facet by variable, for example, but your data is in wide form
    • Facets couldn’t handle that
ipcs_data %>%
  filter(SID == "216") %>%
  select(SID, beep, afraid:content) %>%
  pivot_longer(
    cols = afraid:content
    , names_to = "item"
    , values_to = "value"
  ) %>%
  print(n = 8)
# A tibble: 432 × 4
  SID    beep item      value
  <chr> <int> <chr>     <dbl>
1 216       1 afraid        1
2 216       1 angry         2
3 216       1 attentive     3
4 216       1 content       4
5 216       2 afraid        2
6 216       2 angry         2
7 216       2 attentive     3
8 216       2 content       4
# … with 424 more rows

Appearance: Facets

  • In ggplot2, there are two core faceting functions
    • facet_grid()
    • facet_wrap()
  • facet_grid()
    • snaps figures in a grid; no wrapping
    • especially useful for 1-2 faceting variables
  • facet_wrap()
    • treats each facet as a separate panel
    • wraps according to nrow and ncol arguments

Appearance: Facets

facet_grid()

Core arguments:

  • rows, cols: list of variables or formula, e.g., x ~ y
  • scales: same x or y scale on all facets?
  • space: same space for unequal length x or y facets?
  • switch: move labels from top to bottom ("x") or from right to left ("y")?
  • drop: drop unused factor levels

facet_wrap()

Core arguments:

  • facets: variables quoted with vars() or a one-sided formula, e.g., ~ x + y
  • nrow / ncol: number of rows and columns
  • scales: same x or y scale on all facets?
  • switch: move labels from top to bottom ("x") or from right to left ("y")? (strip.position is the preferred way to do this)
  • drop: drop unused factor levels
  • dir: horizontal or vertical
  • strip.position: where to put the labels
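
A minimal sketch of the two functions side by side, using a made-up long-format `toy` data set with two faceting variables (item and person). The values are arbitrary.

```r
library(ggplot2)

# Hypothetical long-format data: 3 time points x 2 items x 2 people
toy <- expand.grid(
  time = 1:3
  , item = c("happy", "sad")
  , person = c("p1", "p2")
)
toy$value <- c(3, 4, 5, 2, 1, 2, 4, 4, 3, 1, 2, 2)

base <- ggplot(toy, aes(x = time, y = value)) +
  geom_line() +
  geom_point()

# facet_grid(): one variable defines the rows, another the columns
p_grid <- base + facet_grid(rows = vars(item), cols = vars(person))

# facet_wrap(): one ribbon of panels, wrapped into the requested layout
p_wrap <- base + facet_wrap(vars(item, person), ncol = 4, scales = "free_y")
```

Printing p_grid gives a 2 x 2 grid with shared strips; printing p_wrap gives four panels in a single row, each with its own strip and its own y axis.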

Appearance: Facets

facet_grid()

ipcs_data %>%
  filter(SID == "216") %>%
  select(SID, beep, afraid:content) %>%
  pivot_longer(
    cols = afraid:content
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value, group = item)) + 
    geom_point() + 
    geom_line() + 
    facet_grid(item~.) +
    theme_classic()

Appearance: Facets

facet_wrap()

ipcs_data %>%
  filter(SID == "216") %>%
  select(SID, beep, afraid:content) %>%
  pivot_longer(
    cols = afraid:content
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value, group = item)) + 
    geom_point() + 
    geom_line() + 
    facet_wrap(
      ~item
      , ncol = 1
      , strip.position = "right"
      ) +
    theme_classic()

Appearance: Facets

facet_grid(): Change scale and space

ipcs_data %>%
  filter(SID == "216") %>%
  select(SID, beep, afraid:content) %>%
  pivot_longer(
    cols = afraid:content
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value, group = item)) + 
    geom_point() + 
    geom_line() + 
    facet_grid(
      item ~ . 
      , scales = "free_y"
      , space = "free_y"
      ) +
    theme_classic()

Appearance: Labels & Titles

  • APA style says titles are bad
  • Common sense says titles help understanding
  • Ask for forgiveness, not permission

Appearance: Labels & Titles

Remember this?

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_smooth(
      method = "lm"
      , formula = y ~ x
      , se = T
      , color = "blue"
    ) + 
    theme_classic() # I just hate grey backgrounds

Appearance: Labels & Titles

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_smooth(
      method = "lm"
      , formula = y ~ x
      , se = T
      , color = "blue"
    ) + 
    labs(
      x = "Momentary Purpose (1-5)"
      , y = "Momentary Happiness (1-5)"
      , title = "Zero-Order Associations\nBetween Momentary Happiness and Purpose"
    ) + 
    theme_classic() # I just hate grey backgrounds

Appearance: Labels & Titles

Labels also apply to other mappings like color

ipcs_data %>%
  filter(SID %in% c("216", "211", "174", "150", "171")) %>%
  ggplot(aes(
     y = happy
    , fill = SID
    )) + 
    geom_density(alpha = .2) + 
    coord_flip() +
    labs(
      x = "Smoothed Density" # x holds the computed density; coord_flip() draws it vertically
      , y = "Momentary Happiness"
      , fill = "Participant\nID"
    ) + 
    theme_classic()

Appearance: Labels & Titles

You can also use labels to remove axis labels

ipcs_data %>%
  filter(SID %in% c("216")) %>%
  select(SID, Full_Date, happy, purposeful, afraid, attentive) %>%
  pivot_longer(
    cols = c(-SID, -Full_Date)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(
    y = item
    , x = value
    , fill = item
    )) +
    geom_boxplot(width = .5) + 
    labs(
      x = "Momentary Rating (1-5)"
      , y = NULL
      , fill = "Item"
    ) + 
    theme_classic()

Appearance: Themes (Basic)

  • There are lots of themes you can use in ggplot that are pre-built into the package
  • Try typing theme_ into your R console, and look at the functions that pop up
  • Some stand-out ones are:
    • theme_classic() (what we’ve been using)
    • theme_bw()
    • theme_minimal() (but is there a theme_maximal?)
    • theme_void()

Appearance: Themes (Advanced)

  • Custom themes are one of the best ways to “hack” your ggplots
  • You will not remember all of them
  • You will have to google them all the time
  • Here’s the site: https://ggplot2.tidyverse.org/reference/theme.html
  • Rather than give details on a bunch of these, I’m going to demonstrate theme modifications I often use
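
One habit that can cut down on the googling: store the modifications you reuse in an object and add it to each plot. This is only a sketch; the name `my_theme` and the `toy` data are made up, and the specific settings are just examples.

```r
library(ggplot2)

# Hypothetical reusable theme: a complete theme plus a few tweaks
my_theme <- theme_classic() +
  theme(
    plot.title = element_text(face = "bold", size = rel(1.1), hjust = .5)
    , axis.title = element_text(face = "bold", size = rel(1.1))
    , axis.text = element_text(face = "bold", size = rel(1.2))
  )

# Hypothetical toy data
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "A toy plot") +
  my_theme  # added like any other layer

p
```

Further theme() calls added after my_theme still override it, so plot-specific tweaks remain possible.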

Appearance: Themes

ipcs_data %>%
  filter(SID == "216") %>%
  ggplot(mapping = aes(x = purposeful, y = happy)) + 
    geom_jitter(width = .1, height = .1, alpha = .25) + 
    geom_smooth(
      method = "lm"
      , formula = y ~ x
      , se = T
      , color = "blue"
    ) + 
    labs(
      x = "Momentary Purpose (1-5)"
      , y = "Momentary Happiness (1-5)"
      , title = "Zero-Order Associations Between Momentary Happiness and Purpose"
    ) + 
    theme_classic() + 
    theme(
      plot.title = element_text(
        face = "bold"
        , size = rel(1.1)
        , hjust = .5
        )
      , axis.title = element_text(
        face = "bold"
        , size = rel(1.1)
        )
      , axis.text = element_text(
        face = "bold"
        , size = rel(1.2)
        )
    )

Appearance: Themes

ipcs_data %>%
  filter(SID %in% c("216")) %>%
  select(SID, Full_Date, happy, purposeful, afraid, attentive) %>%
  pivot_longer(
    cols = c(-SID, -Full_Date)
    , names_to = "item"
    , values_to = "value"
    , values_drop_na = T
  ) %>%
  group_by(item) %>%
  summarize(
    mean = mean(value)
    , ci = 1.96*(sd(value)/sqrt(n()))
    ) %>%
  ggplot(aes(x = item, fill = item, y = mean)) + 
    geom_col(color = "black") + 
    geom_errorbar(
      aes(ymin = mean - ci, ymax = mean + ci)
      , position = position_dodge(width = .1)
      , width = .1
      , stat = "identity"
    ) + 
    labs(
      x = NULL
      , y = "Mean Momentary Rating (CI)\n[Range 1-5]"
      , title = "Descriptive Statistics of Momentary Emotion Ratings"
    ) + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , plot.title = element_text(face = "bold", size = rel(1.1), hjust = .5)
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", size = rel(1.2))
  )

Appearance: Themes

ipcs_data %>%
  filter(SID == "216") %>%
  select(SID, beep, afraid:content) %>%
  pivot_longer(
    cols = afraid:content
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value, group = item)) + 
    geom_line(aes(color = item)) + 
    geom_point(size = 1) + 
    facet_grid(item~.) +
    labs(
      x = "ESM Beep (#)"
      , y = "Rated Momentary Value (1-5)"
      , title = "Time Series of Four Momentary Emotion Items for Participant 216"
      , color = NULL
      ) + 
    theme_classic() + 
    theme(
      legend.position = "bottom"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", size = rel(1.1), hjust = .5)
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", size = rel(1.2))
      , strip.background = element_rect(color = "black", fill = "cornflowerblue")
      , strip.text = element_text(face = "bold", size = rel(1.2), color = "white")
    )